This is the coursework for MA331, which analyses the TED talks given by Luma Mufleh and Karen Armstrong. The talks are:
Luma Mufleh: Don’t feel sorry for refugees – believe in them
Karen Armstrong: My wish: The Charter for Compassion
For each talk, we investigate whether the speaker's words carry a positive or negative sentiment, using several text-analysis libraries. The report also aims to measure the sentiment each speaker expresses during the talk, and it answers the following questions:
What are the most frequent words in each speaker's talk?
Are there frequent words that both speakers use?
Which sentiments are most frequent among the words each speaker uses?
Methodologically, we use tokenization to split the transcripts into individual words, sentiment analysis to classify those words, and data visualization to present and interpret the results. The sentiment analysis uses the Bing lexicon.
## # A tibble: 3 × 5
## talk_id headline text speaker views
## <dbl> <chr> <chr> <chr> <dbl>
## 1 1 Averting the climate crisis "Thank you so much, Chris.… Al Gore 3.27e6
## 2 7 Simplicity sells "(Music: \"The Sound of Si… David … 1.70e6
## 3 53 Greening the ghetto "If you're here today — an… Majora… 2.00e6
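The dataset contains talks from many speakers (the first three rows are shown above) and has five columns: talk_id, headline, text, speaker and views. To understand the data better, we can look at its structure.
The headline, text and speaker columns are character vectors, while talk_id and views are numeric (double). Next, we tokenize the transcripts, splitting each one into individual words, which gives the following: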
## # A tibble: 3 × 5
## talk_id headline speaker views word
## <dbl> <chr> <chr> <dbl> <chr>
## 1 1 Averting the climate crisis Al Gore 3266733 thank
## 2 1 Averting the climate crisis Al Gore 3266733 much
## 3 1 Averting the climate crisis Al Gore 3266733 chris
Each transcript is now split into one word per row, and common stop words such as ‘you’ and ‘so’ (from “Thank you so much, Chris”) have been removed. These tokens are the basis for the word-frequency and sentiment analyses that follow.
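A minimal sketch of tokenization with stop-word removal, written in Python rather than the R used for the report. The `STOP_WORDS` set is a short hypothetical stand-in for a full stop-word list, and the regular expression is a simplifying assumption about what counts as a word.

```python
import re

# Hypothetical, abbreviated stop-word list for illustration only;
# a real analysis would use a full published list.
STOP_WORDS = {"you", "so", "a", "an", "and", "the", "to", "of", "i", "it", "is", "in", "we", "that"}


def tokenize(text):
    """Lower-case the text and split it into word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())


def remove_stop_words(tokens):
    """Drop tokens that appear in the stop-word list."""
    return [t for t in tokens if t not in STOP_WORDS]


tokens = remove_stop_words(tokenize("Thank you so much, Chris."))
print(tokens)  # ['thank', 'much', 'chris']
```

The output matches the first three tokenized rows of the Al Gore talk shown above.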
Filtering the tokens by speaker and counting how often each word occurs gives the most frequent words for each speaker:
For Luma Mufleh
## # A tibble: 5 × 3
## speaker word n
## <chr> <chr> <int>
## 1 Luma Mufleh one 11
## 2 Luma Mufleh people 11
## 3 Luma Mufleh kids 10
## 4 Luma Mufleh refugee 9
## 5 Luma Mufleh home 8
For Karen Armstrong
## # A tibble: 5 × 3
## speaker word n
## <chr> <chr> <int>
## 1 Karen Armstrong people 23
## 2 Karen Armstrong religion 20
## 3 Karen Armstrong religious 18
## 4 Karen Armstrong world 14
## 5 Karen Armstrong one 13
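The counting step can be sketched in Python as follows. This is only an illustration: the token lists are hypothetical toy data, not the real transcripts, and `top_words` is an illustrative helper name.

```python
from collections import Counter


def top_words(tokens_by_speaker, n=5):
    """Return the n most frequent tokens for each speaker as (word, count) pairs."""
    return {
        speaker: Counter(tokens).most_common(n)
        for speaker, tokens in tokens_by_speaker.items()
    }


# Hypothetical toy tokens, not the real transcripts.
tokens_by_speaker = {
    "Luma Mufleh": ["kids", "people", "refugee", "kids", "home", "people", "kids"],
    "Karen Armstrong": ["religion", "people", "world", "religion", "people", "people"],
}

result = top_words(tokens_by_speaker, n=2)
print(result["Luma Mufleh"])      # [('kids', 3), ('people', 2)]
print(result["Karen Armstrong"])  # [('people', 3), ('religion', 2)]
```

`Counter.most_common` orders tied counts by first occurrence, so tied words such as ‘one’ and ‘people’ (11 each for Luma Mufleh) may appear in either order depending on the tool used.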
To make the most frequent words easier to compare, we can plot them for each speaker in a graph.
A word cloud also summarises the findings, with Luma Mufleh's words on the left and Karen Armstrong's on the right.
From the data, we can see which words each speaker uses most often. For Luma Mufleh, the most frequent words are:
People (11)
One (11)
Kids (10)
For Karen Armstrong, the most frequent words are:
People (23)
Religion (20)
Religious (18)
Some of these words are used by both speakers, and we can compare them in the graph below.
The graph shows that ‘people’ and ‘one’ are two words that both speakers use frequently.
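Finding the words shared by both speakers amounts to intersecting their word sets. Here is a minimal Python sketch that uses the top-five counts from the frequency tables. Because it considers only the top five, it would miss a word that is frequent for both speakers but falls just outside either list.

```python
# Top-five word counts per speaker, copied from the frequency tables.
luma_top = {"one": 11, "people": 11, "kids": 10, "refugee": 9, "home": 8}
karen_top = {"people": 23, "religion": 20, "religious": 18, "world": 14, "one": 13}

# dict key views support set operations, so & gives the shared words.
shared = sorted(luma_top.keys() & karen_top.keys())
for word in shared:
    print(f"{word}: Luma {luma_top[word]}, Karen {karen_top[word]}")
# one: Luma 11, Karen 13
# people: Luma 11, Karen 23
```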
To enrich these findings, we also perform sentiment analysis with the Bing lexicon, which labels words as positive or negative. Counting the sentiment-bearing words for each speaker gives the following:
For Luma Mufleh
## # A tibble: 2 × 2
## Sentiment Frequency
## <chr> <int>
## 1 negative 55
## 2 positive 34
For Karen Armstrong
## # A tibble: 2 × 2
## Sentiment Frequency
## <chr> <int>
## 1 negative 60
## 2 positive 49
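Lexicon-based sentiment analysis matches each token against a word list and counts the labels. Words that the lexicon does not contain are ignored. The minimal Python sketch below shows the idea. Its five-word `LEXICON` and labels are hypothetical stand-ins for the much larger Bing lexicon.

```python
from collections import Counter

# Hypothetical five-word lexicon with illustrative labels (NOT the real Bing data).
LEXICON = {
    "hope": "positive",
    "love": "positive",
    "afraid": "negative",
    "war": "negative",
    "lost": "negative",
}


def sentiment_counts(tokens, lexicon=LEXICON):
    """Count sentiment labels; tokens absent from the lexicon are dropped, like an inner join."""
    return Counter(lexicon[t] for t in tokens if t in lexicon)


counts = sentiment_counts(["we", "lost", "hope", "war", "kids", "afraid"])
print(counts)  # Counter({'negative': 3, 'positive': 1})
```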
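Next, we compare the two speakers across the ten sentiment categories of the NRC emotion lexicon (anger, anticipation, disgust, fear, joy, negative, positive, sadness, surprise and trust). For each category, we compute the log odds ratio for Luma Mufleh relative to Karen Armstrong, which gives the following: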
## # A tibble: 10 × 7
## Sentiment `Luma Mufleh` `Karen Armstrong` OR log_or Ci_lower Ci_upper
## <chr> <int> <int> <dbl> <dbl> <dbl> <dbl>
## 1 sadness 39 29 1.57 0.449 0.952 -0.0537
## 2 fear 44 34 1.51 0.414 0.886 -0.0577
## 3 negative 57 49 1.36 0.310 0.722 -0.101
## 4 disgust 17 16 1.20 0.184 0.882 -0.514
## 5 joy 28 33 0.953 -0.0484 0.476 -0.573
## 6 anger 24 29 0.929 -0.0737 0.487 -0.634
## 7 anticipation 29 38 0.849 -0.163 0.342 -0.669
## 8 positive 67 95 0.748 -0.291 0.0601 -0.642
## 9 surprise 18 27 0.743 -0.297 0.317 -0.911
## 10 trust 38 56 0.738 -0.303 0.135 -0.742
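The code behind this table is not shown. The printed values can be reproduced, to the printed precision, with the formulas below. We inferred these formulas from the numbers rather than from the original code, so they should be read as an assumption. Here $a$ and $b$ are a category's counts for Luma Mufleh and Karen Armstrong, and $N_L = 361$ and $N_K = 406$ are the column totals of the table:

$$
\widehat{\mathrm{OR}} = \frac{(a + 0.5)\,/\,(N_L - a + 0.5)}{(b + 0.5)\,/\,(N_K - b + 0.5)},
\qquad
\mathrm{SE} = \sqrt{\frac{1}{a} + \frac{1}{N_L - a} + \frac{1}{b} + \frac{1}{N_K - b}}
$$

$$
95\%\ \mathrm{CI} = \log\widehat{\mathrm{OR}} \pm 1.96\,\mathrm{SE}
$$

For sadness ($a = 39$, $b = 29$), $\widehat{\mathrm{OR}} = (39.5/322.5)/(29.5/377.5) \approx 1.567$, $\log\widehat{\mathrm{OR}} \approx 0.449$ and $\mathrm{SE} \approx 0.257$. The interval is therefore $0.449 \pm 0.503$, that is, $(-0.054,\ 0.952)$.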
These log odds ratios can also be visualized, as shown below.
The results show that Luma Mufleh uses a larger share of negative words than Karen Armstrong. In the Bing sentiment counts, about 62% (55 of 89) of Luma Mufleh's sentiment-bearing words are negative, compared with about 55% (60 of 109) for Karen Armstrong.
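These percentages follow directly from the Bing counts in the two sentiment tables. The following short Python check computes them:

```python
# Bing sentiment counts reported for each speaker.
bing_counts = {
    "Luma Mufleh": {"negative": 55, "positive": 34},
    "Karen Armstrong": {"negative": 60, "positive": 49},
}


def negative_share(counts):
    """Fraction of sentiment-bearing words that are negative."""
    return counts["negative"] / (counts["negative"] + counts["positive"])


shares = {speaker: negative_share(c) for speaker, c in bing_counts.items()}
for speaker, share in shares.items():
    print(f"{speaker}: {share:.0%} negative")
# Luma Mufleh: 62% negative
# Karen Armstrong: 55% negative
```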
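In the log odds ratio table, a positive value means that a category is relatively more frequent in Luma Mufleh's words than in Karen Armstrong's, and a negative value means the opposite. Sadness, fear, negative and disgust have positive log odds ratios. The other six categories, including anger, have negative ones. However, every 95% confidence interval includes zero, so none of these differences is statistically significant. Note also that the Ci_lower and Ci_upper columns of the printed table are swapped: for sadness, the interval runs from -0.0537 to 0.952.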
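The following minimal sketch recomputes the log odds ratios and their 95% intervals from the NRC category counts in the table, then checks whether each interval contains zero. It is written in Python rather than the report's R. The 0.5 correction and the standard-error formula are assumptions, inferred because they reproduce the printed values.

```python
import math

# NRC category counts (Luma Mufleh, Karen Armstrong), copied from the log odds table.
nrc_counts = {
    "sadness": (39, 29), "fear": (44, 34), "negative": (57, 49),
    "disgust": (17, 16), "joy": (28, 33), "anger": (24, 29),
    "anticipation": (29, 38), "positive": (67, 95),
    "surprise": (18, 27), "trust": (38, 56),
}


def log_odds_ratio(a, b, n_a, n_b, z=1.96):
    """Log odds ratio (speaker A vs. speaker B) for one category, with a Wald interval.

    Assumption: 0.5 is added to each cell for the point estimate but not for
    the standard error; this choice reproduces the printed table.
    """
    odds_a = (a + 0.5) / (n_a - a + 0.5)
    odds_b = (b + 0.5) / (n_b - b + 0.5)
    log_or = math.log(odds_a / odds_b)
    se = math.sqrt(1 / a + 1 / (n_a - a) + 1 / b + 1 / (n_b - b))
    return log_or, log_or - z * se, log_or + z * se


n_luma = sum(a for a, _ in nrc_counts.values())   # 361
n_karen = sum(b for _, b in nrc_counts.values())  # 406

results = {
    cat: log_odds_ratio(a, b, n_luma, n_karen) for cat, (a, b) in nrc_counts.items()
}
for cat, (est, lower, upper) in results.items():
    print(f"{cat:>12}: log OR {est:+.3f}, 95% CI ({lower:+.3f}, {upper:+.3f}), "
          f"includes 0: {lower < 0 < upper}")
```

Printing the lower bound before the upper bound avoids the column swap in the original table.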
The conclusions of the analysis are:
Luma Mufleh tends to use more negative words than Karen Armstrong.
‘People’ is the most frequent word shared by both speakers.
In the log odds ratios, negative emotion categories such as sadness and fear are relatively more frequent in Luma Mufleh’s talk, although none of the differences is statistically significant at the 95% level.